Entropy loss in long-distance DNA looping 
The entropy loss due to the formation of one or multiple loops in circular and linear DNA chains is calculated from a scaling approach in the limit of long chain segments. The analytical results allow a fast estimate of the entropy loss for a given configuration. Numerical values obtained for some examples suggest that the entropy loss encountered in loop closure in typical genetic switches may become a relevant factor which has to be overcome by the released bond energy between the looping contact sites.
I. INTRODUCTION

Gene expression in all organisms comprises the transcription of a certain gene on the DNA into messenger RNA (mRNA) through RNA polymerase starting from the promoter site, and its subsequent translation into a protein. The initiation of transcription at a specific gene is governed by a subtle cooperative scheme of transcription factors, which in turn is determined by a given set of boundary conditions such as the concentration of the transcription factors. Transcription factors often act cooperatively, and they are known to interact with each other over distances of several thousand base pairs (bp). This interaction is effected through DNA looping, compare Fig. 1.

Figure 1: DNA looping in a circular (left) and a linear (right) DNA. The rounded boxes indicate the chemical bonds established between the transcription factors through looping at specific contact sites on the DNA double-helix which are fairly distant from one another in terms of the arc length along the DNA. A telomere loop corresponds to the right configuration with vanishing $\ell_1$ [or $\ell_2$].

A typical example for DNA looping is found in the genetic switch which determines whether the replication of bacteriophage λ in E. coli follows the lysogenic or the lytic pathway. A key component of this λ switch is the λ repressor, which activates the expression of the gene that encodes the λ repressor itself. The λ repressor can bind to the three operator sites $O_R$ which overlap the two promoter sites of the switch. It binds cooperatively as a dimer, and typically under stable lysogenic conditions two such dimers on $O_R$ form a tetramer, the next higher order of cooperativity, which is the main factor for the stability of the λ switch against noise. However, the λ repressor can also bind to the very similar operator $O_L$, which is located roughly 2300 bp away and is not part of the λ switch. It has been found that the two λ repressor tetramers at $O_L$ and $O_R$ synergistically form an octamer through DNA looping. This higher-order oligomerisation enhances the performance of the switch considerably. The specific binding along the tetramer-tetramer interface has recently been revealed through crystallographic structure determination. Similar realisations of DNA looping also occur in linear DNA, naturally in the form of telomeres or in vitro in engineered DNA, compare Fig. 1. Multiple looping in large DNA molecules around a locus can be observed in vivo and can be induced in vitro by introducing specific binding zones on the DNA, which leads to a considerable reduction of the gyration radius of the molecule such that it can be more easily transferred into (e.g., mammalian) cells [12].

DNA looping often involves large loop sizes of several 
thousand bp. Therefore, the formation of these loops 
causes a non-negligible entropy loss which has to be over- 
come by the binding energy released at the bond forma- 
tion on loop closure. In the present study, we quantify 
this entropy loss for such long DNA loops, taking into 
account self-avoiding effects due to both the monomer-monomer
interaction within the loop and the additional
effects due to the higher order contact points (vertices) at 
the loop closure site. The resulting numbers for typical 
systems suggest that the entropy loss is a relevant fac- 
tor in the formation of DNA loops, and it gives a lower 
bound for the bond-forming energy required to stabilise 
the loop. 

Entropy loss due to loop formation was studied for the case of disconnected loops by Schellmann and Flory [13]. In their seminal paper, Poland and Scheraga [14] considered coupled Gaussian loops. To our knowledge, the full effect of self-avoidance in the DNA looping network has not been considered before. Here, the contribution of non-trivial vertices turns out to be a relevant factor, and for multiple looping with a common locus it actually becomes the dominating contribution. The analytical results presented here are derived from a scaling approach for general polymer networks, and they have the advantage that estimates for the entropy loss in a given DNA system can be computed from them in a straightforward manner. It should also be noted that the additional vertex effects studied herein may be crucial in the analytical treatment of the DNA looping dynamics, as the higher-order self-interaction at such vertices poses an additional barrier in the loop closure process [15]. Our results for long DNA with large loops complement the investigations of the bending and twisting energies in small DNA plasmids. In the case of intermediate sized DNA segments, both approaches may be combined.

In what follows, we calculate the scaling results for the entropy loss on looping for three different cases: (i) looping in a circular DNA, (ii) looping in a linear DNA, and (iii) multiple looping in a circular DNA. In the appendix, the general expressions for calculating the system entropy of an arbitrary polymer network are compiled so that the entropy loss for different configurations can be calculated according to the general procedure developed below.



II. LOOPING IN A CIRCULAR DNA CHAIN 

As stated before, we consider the limit in which each segment of the looped DNA, e.g., each of the two subloops created in the circular DNA upon looping, is long in comparison to the persistence length $\ell_p$ of the double-stranded DNA chain [17]. In this long chain limit, we can neglect energetic effects due to bending or twisting, such that we treat the DNA as a flexible self-avoiding polymer. Therefore, we can employ results for the configuration number of a general polymer network, which we briefly review in the appendix.

Before looping, the free energy of the circular DNA of total length $L$ is given by

$$F_{\mathrm{circ}} = H_0 - T S_{\mathrm{circ}}, \tag{1}$$

where $H_0$ combines all binding enthalpies in the macromolecule and the entropy $S_{\mathrm{circ}} = k_B \ln \omega_{\mathrm{circ}}$ is determined by the number of configurations [18] (see the appendix)

$$\omega_{\mathrm{circ}} = A_{\mathrm{circ}}\, \mu^L L^{-3\nu} \tag{2}$$

of a simply connected ring polymer of length $L$ in units of the monomer length. The latter can be estimated by the persistence length $\ell_p$ of the polymer (about 500 Å for double-stranded DNA, corresponding to 100 bp [19]). In equation (2), $A_{\mathrm{circ}}$ is a non-universal amplitude, $\mu$ is the support-dependent connectivity constant, and $\nu \approx 0.588$ is the Flory exponent. Thus, $S_{\mathrm{circ}}$ has the form

$$S_{\mathrm{circ}} = k_B \left( \ln A_{\mathrm{circ}} + L \ln \mu - 3\nu \ln L \right). \tag{3}$$



On looping, as sketched in Fig. 1 to the left, the circular DNA is divided into two subloops of lengths $\ell$ and $L - \ell$ by creation of a vertex at which four legs of the chain are bound together. For a self-avoiding chain, the number of configurations of the resulting "figure-eight" shape is not simply the product of the configuration numbers of the two created loops, but has the more complicated form [22],

$$\omega_8 = A_8\, \mu^L (L-\ell)^{-6\nu+\sigma_4}\, \mathcal{Y}_8\!\left(\frac{\ell}{L-\ell}\right). \tag{4}$$

In this expression, $A_8$ is a non-universal amplitude, $\mathcal{Y}_8$ is a universal scaling function, and $\sigma_4 \approx -0.48$ is a universal exponent associated with the vertex with four outgoing legs. Note that in the Gaussian chain limit, the exponents $\sigma_N$ vanish; as we are going to show, the additional effects due to the higher order vertex formation, reflected by nonzero values of $\sigma_N$, are non-negligible. Given the entropy $S_8 = k_B \ln \omega_8$ of the figure-eight configuration, the entropy loss suffered from creating this configuration out of the original circular DNA amounts to $|S_8 - S_{\mathrm{circ}}|$. To proceed, we now evaluate the scaling function $\mathcal{Y}_8(x)$ in some special cases, and calculate typical numbers for the required entropy loss compensation. Two limiting cases can be distinguished:

(1.) If one of the loop sizes is much smaller than the other ($\ell \ll L - \ell$, say), the big loop of size $L - \ell$ will essentially behave like a free circular chain so that its contribution to $\omega_8$ will scale like that of a regular ring polymer, i.e., like $(L-\ell)^{-3\nu}$. Consequently, we find the behaviour $\mathcal{Y}_8(x) \simeq a\, x^{-3\nu+\sigma_4}$ for $x \ll 1$, where $a$ is a universal amplitude, and therefore

$$\omega_8 \simeq A_8\, a\, \mu^L (L-\ell)^{-3\nu}\, \ell^{-3\nu+\sigma_4}. \tag{5}$$

In this case, the free energy difference between the initial circular and the looped states becomes

$$\Delta F = \Delta H_{\mathrm{bond}} - T\left(S_8 - S_{\mathrm{circ}}\right), \tag{6}$$

where $\Delta H_{\mathrm{bond}}$ is the binding enthalpy at the loop closure site. The formation of the looping bond has to release a higher enthalpy than what is lost in entropy, i.e., $\Delta H_{\mathrm{bond}} < T(S_8 - S_{\mathrm{circ}})$. Collecting the different expressions, we thus find the condition

$$\Delta H_{\mathrm{bond}} < k_B T \left[ \ln\frac{A_8\, a}{A_{\mathrm{circ}}} - 3\nu \ln\frac{\ell(L-\ell)}{L} + \sigma_4 \ln \ell \right]. \tag{7}$$

In this expression (and in similar expressions below), the first term in the square brackets is non-universal and depends on details of the model, whereas the remaining contributions are universal (apart from the fact that $L$ is measured in units of the non-universal monomer length).
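
For readers who want the intermediate step: condition (7) follows from inserting the logarithm of (5) and expression (3) into the requirement $\Delta H_{\mathrm{bond}} < T(S_8 - S_{\mathrm{circ}})$. The sketch below uses only symbols defined above; the terms $L \ln \mu$ cancel between the two entropies.

```latex
\begin{align*}
\frac{S_8 - S_{\mathrm{circ}}}{k_B}
  &= \ln\frac{A_8\,a}{A_{\mathrm{circ}}}
     - 3\nu \ln(L-\ell) + (\sigma_4 - 3\nu)\ln\ell + 3\nu \ln L \\
  &= \ln\frac{A_8\,a}{A_{\mathrm{circ}}}
     - 3\nu \ln\frac{\ell\,(L-\ell)}{L} + \sigma_4 \ln\ell .
\end{align*}
```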

To get an estimate for the magnitude of the entropy loss, consider the case of the λ repressor loop in E. coli. With the size of the entire DNA of approximately $3.5 \times 10^3$ kbp and the looping branch of about 2.3 kbp, the two loops correspond to $3.5 \times 10^4$ and 23 monomers, respectively (each monomer corresponds to a persistence length $\ell_p$ of 100 bp, see above). Neglecting the non-universal first term in brackets in expression (7) [26], these numbers produce

$$\Delta H_{\mathrm{bond}} < -7.0\, k_B T = -17.5\ \mathrm{kJ/mol} = -4.2\ \mathrm{kcal/mol}; \tag{8}$$

here and in the following examples we choose $T = 300$ K and make use of the gas constant, $R = 8.31$ J K$^{-1}$ mol$^{-1}$, and the conversion factor 1 cal = 4.2 J [27]. Expression (8) gives a considerable minimal value for the required bond energy between the two looping sites. For comparison, the typical free energy for base pair formation in DNA is 8 kcal/mol for AT pairs and 13 kcal/mol for GC pairs. Thus, even for the relatively small loop of 23 monomers, the required enthalpy release is non-negligible. Note that the relative contribution stemming from the $\sigma_4$ term in equation (7) amounts to about 20% of the required enthalpy.
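
The numbers in this estimate can be checked with a few lines of Python. This is a minimal sketch that evaluates only the universal part of condition (7); the function and constant names are illustrative choices, not notation from the paper, and the non-universal amplitude term is dropped as in the text.

```python
import math

NU = 0.588           # Flory exponent quoted in the text
SIGMA_4 = -0.48      # 4-vertex exponent quoted in the text
R_GAS = 8.31         # gas constant in J/(K mol), as used in the text
TEMPERATURE = 300.0  # temperature in K, as used in the text
J_PER_CAL = 4.2      # conversion factor used in the text


def small_loop_bound_kT(loop, total):
    """Universal part of condition (7) in units of k_B T.

    `loop` and `total` are lengths in monomers (persistence lengths).
    The non-universal term ln(A_8 a / A_circ) is neglected.
    """
    rest = total - loop
    return -3.0 * NU * math.log(loop * rest / total) + SIGMA_4 * math.log(loop)


def kT_to_kJ_per_mol(value_kT):
    """Convert an energy in units of k_B T to kJ/mol via R T."""
    return value_kT * R_GAS * TEMPERATURE / 1000.0


bound = small_loop_bound_kT(loop=23, total=3.5e4)
vertex_share = SIGMA_4 * math.log(23) / bound  # fraction due to the sigma_4 term

print(f"{bound:.1f} k_B T")
print(f"{kT_to_kJ_per_mol(bound):.1f} kJ/mol")
print(f"{kT_to_kJ_per_mol(bound) / J_PER_CAL:.1f} kcal/mol")
print(f"sigma_4 share: {vertex_share:.0%}")
```

With the values quoted in the text, the sketch reproduces the bound of about $-7.0\,k_B T$ and a $\sigma_4$ share of roughly one fifth.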

(2.) If the two created loops are of comparable size, i.e., $x = \ell/(L-\ell) \approx 1$, the corresponding value of the scaling function $\mathcal{Y}_8(x)$ is a finite number. For example, for $\ell = L/2$ one finds

$$\omega_8 = A_8\, \mathcal{Y}_8(1)\, \mu^L \left(\frac{L}{2}\right)^{-6\nu+\sigma_4}. \tag{9}$$

Proceeding as for condition (7), the universal part of the bound on $\Delta H_{\mathrm{bond}}/k_B T$ is $-3\nu \ln(L/4) + \sigma_4 \ln(L/2)$. In a modified DNA with two loops of 2.3 kbp each, one finds a bond enthalpy requirement of

$$\Delta H_{\mathrm{bond}} < -5.8\, k_B T = -14.5\ \mathrm{kJ/mol} = -3.4\ \mathrm{kcal/mol}, \tag{10}$$

where we again neglect the non-universal first term, $\ln[A_8 \mathcal{Y}_8(1)/A_{\mathrm{circ}}]$ [26]. If both loops are of size $2 \times 10^3$ kbp each, the required bond enthalpy would increase in magnitude to $\Delta H_{\mathrm{bond}} < -12.5$ kcal/mol.



III. LOOPING IN A LINEAR DNA CHAIN

A linear chain of length $L$ can assume

$$\omega_{\mathrm{lin}} = A_{\mathrm{lin}}\, \mu^L L^{\gamma-1} \tag{11}$$

distinct configurations, where $A_{\mathrm{lin}}$ is a non-universal amplitude and $\gamma \approx 1.16$ is a universal exponent. If looping occurs and produces the Λ-shape in Fig. 1 to the right, the configuration number is modified to

$$\omega_\Lambda = A_\Lambda\, \mu^L (L-\ell)^{\gamma-1-3\nu+\sigma_4}\, \mathcal{Y}_\Lambda\!\left(\frac{\ell}{L-\ell}, \frac{\ell_1}{\ell_2}\right), \tag{12}$$

where $\ell$ is the size of the loop and $\ell_1$, $\ell_2$ are the sizes of the two loose end-segments, respectively.

We distinguish four different cases belonging to two groups: the configuration with $\ell_1 \approx \ell_2$, and the telomere configuration for which $\ell_1 = 0$ (or $\ell_2 = 0$).

(1.) If $\ell_1 = \ell_2$, we find

$$\omega_\Lambda = A_\Lambda\, \mu^L (L-\ell)^{\gamma-1-3\nu+\sigma_4}\, \mathcal{W}_\Lambda\!\left(\frac{\ell}{L-\ell}\right), \tag{13}$$

where $\mathcal{W}_\Lambda(x) = \mathcal{Y}_\Lambda(x, 1)$. If furthermore $\ell \ll L - \ell$, reasoning analogous to case II.(1.) leads to

$$\omega_\Lambda \simeq A_\Lambda\, b\, \mu^L (L-\ell)^{\gamma-1}\, \ell^{-3\nu+\sigma_4}, \tag{14}$$

where $b$ is a universal number. The fact that $\ell$ carries the same exponent as in case II.(1.) is due to the local effect of self-interaction for the small loop; in both cases, the small loop is connected to a 4-vertex.

For the binding enthalpy we obtain the condition

$$\Delta H_{\mathrm{bond}} < k_B T \left[ \ln\frac{A_\Lambda\, b}{A_{\mathrm{lin}}} + (\gamma-1) \ln\frac{L-\ell}{L} - (3\nu-\sigma_4) \ln \ell \right]. \tag{15}$$

To obtain a numerical value, consider the λ repressor loop of 23 monomers and the E. coli DNA length of $3.5 \times 10^4$ monomers, a configuration which can be obtained by cutting the E. coli DNA. Neglecting the (non-universal) first term in the square brackets, we find in this case

$$\Delta H_{\mathrm{bond}} < -7.0\, k_B T = -17.5\ \mathrm{kJ/mol} = -4.2\ \mathrm{kcal/mol}, \tag{16}$$

where the exact numerical value is slightly smaller than in equation (8).

(2.) Conversely, if $\ell = \ell_1 = \ell_2$, the simpler expression

$$\omega_\Lambda = A_\Lambda\, \mathcal{W}_\Lambda(1/2)\, \mu^L \left(\frac{2L}{3}\right)^{\gamma-1-3\nu+\sigma_4} \tag{17}$$

emanates, and the binding enthalpy has to fulfil

$$\Delta H_{\mathrm{bond}} < k_B T \left[ \ln\frac{A_\Lambda\, \mathcal{W}_\Lambda(1/2)}{A_{\mathrm{lin}}} + (\gamma-1) \ln\frac{2}{3} - (3\nu-\sigma_4) \ln\frac{2L}{3} \right]. \tag{18}$$

Taking 23 monomers for each segment and neglecting the (non-universal) first term in the square brackets yields the condition

$$\Delta H_{\mathrm{bond}} < -8.6\, k_B T = -21.6\ \mathrm{kJ/mol} = -5.1\ \mathrm{kcal/mol} \tag{19}$$

for the binding energy. If the segments are larger by a factor of 100, this value gets modified to $\Delta H_{\mathrm{bond}} < -11.3$ kcal/mol.
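
The equal-segment estimate for the linear chain can be reproduced numerically. The following is a minimal sketch of the universal part of the condition for $\ell = \ell_1 = \ell_2$, with $\gamma = 1.16$, $\nu = 0.588$ and $\sigma_4 = -0.48$ as quoted in the text; the function name is a hypothetical helper, and the non-universal amplitude term is neglected.

```python
import math

NU = 0.588
GAMMA = 1.16
SIGMA_4 = -0.48
# R T in kcal/mol at T = 300 K, with R = 8.31 J/(K mol) and 1 cal = 4.2 J
KT_IN_KCAL_PER_MOL = 8.31 * 300.0 / 4200.0


def equal_segments_bound_kT(segment):
    """Universal part of the bound for loop = both end-segments = `segment`.

    `segment` is given in monomers, so the total chain length is 3 * segment
    and the scaling length 2L/3 equals 2 * segment.
    """
    total = 3 * segment
    return ((GAMMA - 1.0) * math.log(2.0 / 3.0)
            - (3.0 * NU - SIGMA_4) * math.log(2.0 * total / 3.0))


short = equal_segments_bound_kT(23)     # 2.3 kbp per segment
long_ = equal_segments_bound_kT(2300)   # segments larger by a factor of 100

print(f"{short:.2f} k_B T = {short * KT_IN_KCAL_PER_MOL:.1f} kcal/mol")
print(f"{long_:.2f} k_B T = {long_ * KT_IN_KCAL_PER_MOL:.1f} kcal/mol")
```

Under these assumptions the first case evaluates to about $-8.66\,k_B T$, which agrees with the quoted $-8.6\,k_B T$ and $-5.1$ kcal/mol up to rounding, and the second case gives the quoted $-11.3$ kcal/mol.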

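(3.) The next two cases belong to the telomere configuration corresponding to Fig. 1 to the right with $\ell_1 = 0$ and $\ell_2 = L - \ell$. This case involves a 3-vertex instead of a 4-vertex, and has only one loose end-segment. The number of configurations the telomere configuration can assume is

$$\omega_{\mathrm{telo}} = A_{\mathrm{telo}}\, \mu^L (L-\ell)^{-3\nu+\sigma_3+\sigma_1}\, \mathcal{X}_{\mathrm{telo}}\!\left(\frac{\ell}{L-\ell}\right), \tag{20}$$

where $\sigma_3 \approx -0.18$ and $\sigma_1 = (\gamma-1)/2 \approx 0.08$ (see the appendix). Let us first calculate the entropy loss in the small loop limit $\ell \ll L - \ell$. Here, the linear chain part should essentially behave like a simple linear chain, which implies $\mathcal{X}_{\mathrm{telo}}(x) \simeq c\, x^{-3\nu+\sigma_3-\sigma_1}$ for $x \ll 1$ and thus

$$\omega_{\mathrm{telo}} \simeq A_{\mathrm{telo}}\, c\, \mu^L (L-\ell)^{\gamma-1}\, \ell^{-3\nu+\sigma_3-\sigma_1}, \tag{21}$$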



where $c$ is a universal number.

The corresponding condition for the bond enthalpy reads

$$\Delta H_{\mathrm{bond}} < k_B T \left[ \ln\frac{A_{\mathrm{telo}}\, c}{A_{\mathrm{lin}}} + (\gamma-1) \ln\frac{L-\ell}{L} - (3\nu-\sigma_3+\sigma_1) \ln \ell \right]. \tag{22}$$

Taking a loop of 2.3 kbp in a chain of length 3500 kbp and neglecting the (non-universal) first term in the square brackets gives

$$\Delta H_{\mathrm{bond}} < -6.3\, k_B T = -15.8\ \mathrm{kJ/mol} = -3.8\ \mathrm{kcal/mol}. \tag{23}$$

For comparison, if the loop size is 230 kbp, this value is increased in magnitude to $\Delta H_{\mathrm{bond}} < -9.3$ kcal/mol.

(4.) If the loop and the linear chain segment are of equal size, $\ell = L/2$, the configuration number becomes

$$\omega_{\mathrm{telo}} = A_{\mathrm{telo}}\, \mathcal{X}_{\mathrm{telo}}(1)\, \mu^L \left(\frac{L}{2}\right)^{-3\nu+\sigma_3+\sigma_1}, \tag{24}$$

and we obtain the condition

$$\Delta H_{\mathrm{bond}} < k_B T \left[ \ln\frac{A_{\mathrm{telo}}\, \mathcal{X}_{\mathrm{telo}}(1)}{A_{\mathrm{lin}}} - (3\nu-\sigma_3) \ln\frac{L}{2} - \frac{\gamma-1}{2} \ln(2L) \right]. \tag{25}$$

Taking a chain length of 460 kbp and neglecting the (non-universal) first term in the square brackets we find

$$\Delta H_{\mathrm{bond}} < -15.8\, k_B T = -39.3\ \mathrm{kJ/mol} = -9.4\ \mathrm{kcal/mol}. \tag{26}$$



IV. MULTIPLE LOOPING IN A CIRCULAR DNA CHAIN

Figure 2: DNA loop condensation. The circles in the original DNA double-helix denote likely contact points. Formation of bonds between these contacts with one common agglomeration centre, as indicated by the dashed lines, results in the locus configuration on the right. Note the reduction in the gyration radius during this process. A higher-order vertex is created at the locus point.

Assume that $m$ potential connector points are distributed evenly along a circular DNA chain of total length $L$. If these condense to form a common locus, $m$ loops of equal size are created which are held together at this locus, as sketched in Fig. 2 [12]. This creates, in the scaling limit, a high-order vertex where $2m$ legs are joined. The procedure for the configuration number for this locus configuration yields

$$\omega_{\mathrm{locus}} = A_{\mathrm{locus}}\, \mu^L \left(\frac{L}{m}\right)^{-3m\nu+\sigma_{2m}}, \tag{27}$$

where the universal exponent $\sigma_{2m}$ is associated with a vertex with $2m$ outgoing legs (see the appendix). It should be noted that this result holds true only if the size of the locus is much smaller than the sizes of the created loops.

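Due to the assumption that all $m$ loops are of the same size, we immediately arrive at

$$\Delta H_{\mathrm{bond}} < k_B T \left[ \ln\frac{A_{\mathrm{locus}}}{A_{\mathrm{circ}}} + 3\nu(1-m) \ln L + 3m\nu \ln m + \sigma_{2m} \ln\frac{L}{m} \right]. \tag{28}$$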



The absolute value of $\sigma_{2m}$ increases rapidly with increasing $m$, and can be determined from Padé or Padé-Borel analysis as shown in reference [24]. We list the topological exponents in the appendix. Taking a circular chain of 3500 kbp and $m = 4$, and neglecting the (non-universal) first term in the square brackets, we find that the entropy loss is fairly high (using $\sigma_8 = -2.4$),

$$\Delta H_{\mathrm{bond}} < -67.4\, k_B T = -168\ \mathrm{kJ/mol} = -40.1\ \mathrm{kcal/mol}. \tag{29}$$

In this case, the contribution due to the $\sigma_8$ term amounts to roughly one third of the total entropy loss (about 21.8 of the 67.4 $k_B T$).
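
As a cross-check of the multiple-looping estimate, the universal part of condition (28) can be evaluated directly. This is a minimal sketch assuming $\nu = 0.588$, $\sigma_8 = -2.4$, $T = 300$ K and $R = 8.31$ J K$^{-1}$ mol$^{-1}$ as in the text; `locus_bound_kT` is a hypothetical helper name, and the amplitude ratio is neglected.

```python
import math

NU = 0.588
RT_IN_KJ_PER_MOL = 8.31 * 300.0 / 1000.0  # R T at 300 K


def locus_bound_kT(total, m, sigma_2m):
    """Universal part of condition (28) in units of k_B T.

    `total` is the chain length in monomers, `m` the number of equal loops,
    and `sigma_2m` the exponent of the vertex with 2m legs.
    """
    return (3.0 * NU * (1 - m) * math.log(total)
            + 3.0 * m * NU * math.log(m)
            + sigma_2m * math.log(total / m))


bound = locus_bound_kT(total=3.5e4, m=4, sigma_2m=-2.4)
print(f"{bound:.1f} k_B T")
print(f"{bound * RT_IN_KJ_PER_MOL:.0f} kJ/mol")
```

For $m = 1$ and $\sigma_2 = 0$ the function returns zero, as expected for an unlooped ring.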



V. CONCLUSIONS 

We have presented an analytical method to estimate the entropy loss in different scenarios of DNA looping in the limit of long segments. This approach explicitly takes into account the self-avoidance and the interacting nature of the formed loops and other segments, and considers the additional effect of vertex formation, i.e., the effective interaction between different segments at the point where they are joined. This is possible via the scaling theory for arbitrary polymer networks derived by Duplantier. The obtained numbers do not vary much, owing to the logarithmic dependence on the segment sizes. However, they are all non-negligible, and therefore have to be compensated by the released bond energy on formation of the DNA loop. We noted that the entropy loss is of the same order as, or close to, the bond melting energy required for splitting an AT or GC bond, i.e., a considerable amount. Moreover, it is to be expected that the vertex effect increases the characteristic bond formation times in analytical approaches which are based on the free energy.

Our calculations are valid in the long chain limit. In units of the monomer size, taken as the typical persistence length $\ell_p \approx 100$ bp of the DNA double-helix, at least 10 monomers are expected to be required to consider a segment in the final structure flexible. For shorter segments, additional effects due to bending and twisting energy are expected to become relevant. As the mentioned examples document, there are numerous systems, both in vivo and in vitro, in which the flexibility condition is easily fulfilled, and in which our estimation method for the entropy loss becomes fully applicable. The persistence length of single-stranded DNA and RNA is much shorter, typically taken to be of the order $\ell_p \approx 8$ bases. Thus, in single strand looping experiments the expected entropy loss will be considerably larger.
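
The conversion from sequence length to monomers, and the flexibility criterion stated above, can be sketched as follows. The helper names are illustrative, and the threshold of 10 monomers is the rough criterion quoted in the text rather than a sharp bound.

```python
def monomers(length_in_bases, persistence_length_in_bases):
    """Number of monomers when one monomer is one persistence length."""
    return length_in_bases / persistence_length_in_bases


def is_flexible(n_monomers, minimum=10):
    """Rough long-chain criterion from the text: at least ~10 monomers."""
    return n_monomers >= minimum


double_stranded = monomers(2300, 100)  # 2.3 kbp loop, l_p ~ 100 bp
single_stranded = monomers(2300, 8)    # same length, l_p ~ 8 bases

print(double_stranded, is_flexible(double_stranded))
print(single_stranded, is_flexible(single_stranded))
```

The same sequence length thus corresponds to many more monomers for single strands, which is why the logarithmic entropy terms grow in that case.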
Appendix A: CONFIGURATION EXPONENTS
FOR A GENERAL POLYMER NETWORK



Figure 3: Polymer network $\mathcal{G}$ with vertices (•) of different order ($n_1 = 5$, $n_3 = 4$, $n_4 = 3$, $n_5 = 1$).

N    σ_N (Padé)    σ_N (Padé-Borel)    σ_N (Padé-Borel)
3    -0.19         -0.19               -0.17
4    -0.48         -0.49               -0.47
5    -0.86         -0.87               -0.84
6    -1.33         -1.29
7    -1.88         -1.75
8    -2.51         -2.23
9    -3.21         -2.71

Table I: Topological exponents $\sigma_N$ for network vertices of order $N$ in 3D from various approximate techniques. The columns correspond to the first three columns of Table 1 in the cited reference, where the scaling relation $\gamma_N - 1 = \sigma_N + N\sigma_1$ and $\sigma_1 = (\gamma-1)/2$ with the best available value $\gamma = 1.1575$ have been used. Note the large discrepancy between the different methods for increasing $N$.

A general polymer network $\mathcal{G}$ like the one depicted in Fig. 3 consists of vertices which are joined by $M$ chain segments of lengths $s_1, \ldots, s_M$. Let their total length be $L = \sum_{i=1}^{M} s_i$. In the scaling limit $s_i \gg 1$, the number of configurations of such a network is given by

$$\omega_{\mathcal{G}} = A_{\mathcal{G}}\, \mu^L s_M^{\gamma_{\mathcal{G}}-1}\, \mathcal{Y}_{\mathcal{G}}\!\left(\frac{s_1}{s_M}, \ldots, \frac{s_{M-1}}{s_M}\right), \tag{A1}$$



where $A_{\mathcal{G}}$ is a non-universal amplitude, $\mu$ is the effective connectivity constant for self-avoiding walks, and $\mathcal{Y}_{\mathcal{G}}$ is a scaling function. The topology of the network is reflected in the configuration exponent

$$\gamma_{\mathcal{G}} = 1 - 3\nu\mathcal{L} + \sum_{N \ge 1} n_N \sigma_N. \tag{A2}$$

Here, $\mathcal{L} = \sum_{N \ge 1} (N-2)\, n_N/2 + 1$ is the Euler number of independent loops, $n_N$ is the number of $N$-vertices, and $\sigma_N$ is an exponent connected to an $N$-vertex. Thus, expression (A1) generalises the familiar form $\omega \sim \mu^L L^{\gamma-1}$ of a linear polymer chain. The numerical values we use in the text are given in Table I for the topological exponents $\sigma_N$; furthermore, we employ $\nu = 0.588$ and $\gamma \approx 1.16$. We also make use of the relation $\gamma = 2\sigma_1 + 1$.
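
To illustrate how the configuration exponent follows from the network topology, the sketch below evaluates the loop number and expression (A2) for the networks used in the main text. The dictionary-based representation and the function names are assumptions made for this example; the exponent values are those quoted in the text.

```python
NU = 0.588
GAMMA = 1.16
# sigma_N values used in the main text (sigma_1 from gamma = 2 sigma_1 + 1)
SIGMA = {1: (GAMMA - 1.0) / 2.0, 3: -0.18, 4: -0.48, 8: -2.4}


def loop_number(vertex_counts):
    """Number of independent loops, sum_N (N - 2) n_N / 2 + 1.

    `vertex_counts` maps the vertex order N to the number n_N of such vertices.
    """
    twice = sum((order - 2) * count for order, count in vertex_counts.items())
    if twice % 2 != 0:
        raise ValueError("vertex counts do not describe a valid network")
    return twice // 2 + 1


def gamma_network(vertex_counts, sigma=SIGMA):
    """Configuration exponent gamma_G of expression (A2)."""
    vertex_sum = sum(count * sigma[order] for order, count in vertex_counts.items())
    return 1.0 - 3.0 * NU * loop_number(vertex_counts) + vertex_sum


networks = {
    "linear chain": {1: 2},
    "simple ring": {},
    "figure-eight": {4: 1},
    "loop with two loose ends": {4: 1, 1: 2},
    "telomere": {3: 1, 1: 1},
    "locus, m = 4": {8: 1},
}
for name, counts in networks.items():
    print(f"{name}: loops = {loop_number(counts)}, "
          f"gamma_G - 1 = {gamma_network(counts) - 1.0:.3f}")
```

For the linear chain the function returns $\gamma$ itself, and for the figure-eight it returns the exponent $-6\nu + \sigma_4$ that appears in the main text.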

Note that in this work we consider the entropy of a given polymer network, into which the total number of physically distinct configurations enters. Two configurations are considered distinct if they cannot be superimposed by translation. In particular, the monomers of the chain are distinguishable. For a simple ring of length $L$ this implies that two configurations are distinct even if they have the same trajectory, but differ from each other by a reptation (translation of the chain within the trajectory) by a non-integer multiple of $L$. The number of configurations of the simple ring is therefore

$$\omega_{\mathrm{circ}} = L\, \tilde{\omega} \sim \mu^L L^{-3\nu}, \tag{A3}$$

where $\tilde{\omega} \sim \mu^L L^{-3\nu-1}$ is the number of configurations of a ring polymer with indistinguishable monomers. Likewise, $\omega_{\mathrm{circ}}$ corresponds to the number of closed random walks
of length L which start and end at a given point in space 
(compare also the first reference 

The number of configurations of a looped structure
(with at least one vertex) is also given by equation (A1).
This is due to the fact that the established looping bond 
is chemically fixed within the chain, so that the chain 
cannot reptate within a given trajectory. For the same 
reason (and in contrast to references [50|), different 
loops cannot exchange length with each other. 